Temporal Correlation and Changepoints
We begin this week by finishing our notes on additive models in R, and then move onto temporal correlation and changepoints.
1 Additive Models in R
1.1 Fitting Additive Models in R
We can fit GAMs in R using the package mgcv, which was designed to allow extensions of generalised linear models (GLMs). The generalised aspect means that we can also extend the standard additive model to situations where we have non-normal responses, but we will not focus on these in this course.
We use the function gam() to fit our model. This works in a very similar manner to the lm() function. The smooth functions are represented by s(). These use the penalised splines approach described last week. Any linear terms can be included additively as normal.
The model will take the form below, where you can include as many smooth or linear terms as you wish.
1.2 Visualising additive models in R
Unlike linear terms, we can’t simply report parameter estimates for smooth terms in additive models. We can instead simply use the plot function to visualise our smooth functions.
2 Autocorrelation
We already know that environmental data are often measured over time, and that consecutive measurements are often related. This relationship between adjacent observations is known as autocorrelation. The term autocorrelation literally means correlation with oneself. Here, we can think of it as each point being correlated with “previous” versions of itself.
The strength of autocorrelation tends to be related to how far apart points are in time (known as lag). Points closer together have more in common than those further apart.
Many statistical models rely on an assumption that our observations (more specifically our error terms) are independent. If we have correlation then each observation “shares” some information with other observations. This means we have less independent information within our dataset and the effective sample size of the dataset will decrease. When we are calculate standard errors, confidence intervals etc., we are using the “wrong” value of \(n\). This can lead to us underestimating the variance and being overconfident in our results.
2.1 Autocorrelation function
We can estimate the strength of temporal dependence using a sample autocorrelation function (ACF). This function represents the autocorrelation of the data at a series of different lags in time. Assuming we have a regularly spaced time series, we compute the sample ACF at lag \(k\) as:
\[r(k) = \frac{\sum_{t=k+1}^n(x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^n(x_t - \bar{x})^2}\]
We compute this for values from \(k=1,\ldots,K\) where \(K\) is some sensible maximum lag.
Our sample ACF is an estimate of the overall ACF, and as such we have to consider uncertainty. Typically we will compute a simple confidence interval around our point estimate at each lag as
\[r(k) \pm 1.96\sqrt{\frac{1}{n}}\]
where \(n\) is the number of observations in the time series.
We plot the ACF function with a separate vertical line representing the size correlation at each lag, and dashed lines for the confidence intervals. If the lines lie within the confidence intervals, no autocorrelation is present.
Here, we have several lines outside the confidence intervals, so autocorrelation exists in this dataset.
Each of the three ACFs below are examples of cases where suggest autocorrelation is present.
- The first has a repeating pattern — suggests seasonality:
- The second has a decreasing pattern — likely caused by trend:
- The third has both patterns — probably seasonality AND trend:
If we have identified autocorrelation in our data, we have to find a way to account for it in our model. In some cases we may choose to simply treat it as a nuisance, and make adjustments to our standard errors to reflect the reduced effective sample size.
The alternative is to explicitly account for the autocorrelation in our model. For seasonal patterns, we may be able to eliminate it using methods discussed previously, such as harmonics. For other types of autocorrelation, we may use approaches such as autoregressive integrated moving average (ARIMA).
2.2 Autocorrelation models
Autoregressive integrated moving average (ARIMA) models are a general class of models which account for autocorrelation. These models combine aspects of two main classes of model: autoregressive (AR) and moving average (MA).
Broadly speaking, AR(\(p\)) models assume that the current value is a function of the previous \(p\) observations. In contrast, MA(\(q\)) models assume that the current value can be computed by a linear regression on the \(q\) previous random error terms. These models are covered in more detail in the Time Series course, but will be addressed briefly here.
2.2.1 AR model
An autoregressive (AR) model accounts for correlation by describing each value as a function of the previous values. The AR(\(p\)) process can be written as
\[X_t = \sum_{i=1}^p \phi_i X_{t-i} + \epsilon_t.\]
Here \(\phi_i\) is the “autoregressive parameter” which measures the strength of the autocorrelation. \(\epsilon_t \sim \mbox{N}(0, \sigma^2)\) is simply random error, often referred to as noise.
2.2.2 MA model
A moving average (MA) model accounts for correlation by describing each value as a function of the previous set of error terms. The MA(\(q\)) process can be written as
\[X_t = \mu + \sum_{i=1}^q \theta_i \epsilon_{t-i} + \epsilon_t.\]
Here \(\mu\) is the mean of the series and \(\theta_i\) is the regression parameter associated with the \(i\)th lag.
2.2.3 ARIMA
The ARIMA is a combination of AR and MA processes. The I stands for Integrated, which relates to “differencing”, i.e., replacing a value with the difference between itself and the previous value. We write this model as ARIMA(\(p,d,q\)), where \(p\) is the order of the AR process, \(d\) is the degree of differencing and \(q\) is the order of the MA.
For example, ARIMA(1,0,0) would be equivalent to an AR(1) model and ARIMA(0,0,1) is an MA(1) model.
We can use the sample ACF to suggest the appropriate model to account for our autocorrelation. A smooth decay suggests that we have AR components. A less structured ACF might suggest an MA is more appropriate. In practice, AR processes are less complex than MA and tend to be used more frequently as a result.
ARIMA methods are all based on regularly spaced data (measurements equally spaced in time). However, in some cases we may have irregularly spaced data. If the data are roughly regular (just a small deviation here and there), we may be able to treat them as though they are regular. If we have missing data, we may be able to impute or interpolate without too many issues. In cases where we have completely irregular data, we may need to use more complex statistical methods (which will not be covered in this course).
2.2.4 ARIMA in R
We can use the arima() function in R to explore autocorrelation. We must first fit a linear model, and then extract the design matrix to use as an input to this function. For example, to fit an AR(1) model we would use the following code:
Code
trend.model0 <- lm(response ~ decimal.date)
X <- model.matrix(trend.model0)
trend.model1 <- arima(y, order = c(1, 0, 0), xreg = X,
include.mean = FALSE)3 Changepoints
One of the main reasons we analyze environmental data is to detect changes. Sometimes these changes occur organically, either as the result of some natural environmental process, or some non-deliberate human action. In some other occasions these changes occur by design, as the result of a deliberate and controlled human action (e.g., policy changes). Regardless of the reason for the change, we want to understand more about when it happened and the extent of the change.
In statistics, a changepoint is a point in time after which some or all of the model parameters might change. Most commonly this is a change in mean or variance, but it could also be a change in some other feature of the data. We may not always know exactly when the changepoint occurs or whether we have a changepoint at all. In some cases we may have more than one changepoint.
Reasons for changepoints might include:
- Environmental events, e.g. flooding, volcanic eruption.
- Policy, e.g. low emissions zones, water quality regulations.
- Changes to measuring equipment.
Some simple examples of changepoints include:
- A shift up (or down) of the mean.
- A short-term change in the mean.
- A change in a model parameter, eg slope.
Consider a series with two different mean levels. The first 20 observations come from \(\mbox{N}(10,1)\) and the next 20 observations come from \(\mbox{N}(15,1)\). Our ability to detect this change depends on the size of the change and the variability in the data. The plots below illustrate these data and a possible model for these.
It can be difficult to distinguish changepoints from trend. The plots below illustrate how the magnitude of the shift in mean value can affect our ability to identify the shift in mean. The bottom right plot illustrates the additional effect of a change in variance.
3.1 Known changepoint
Sometimes it is known that a change occurred at a specific timepoint, but the magnitude or shape of this change are not known.
3.1.1 Known changepoint — mean shift
Suppose that we have a series of data \(Y_i\) collected at a set of timepoints \(t_i\) with \(i=1,\ldots, n\). If our known changepoint is at time \(c\), then we can construct an indicator function \[\mathcal{I}_{t_i} = \begin{cases} 0\hspace{0.5cm} \text{if } \hspace{0.1cm} t_i < c\\ 1\hspace{0.5cm} \text{if } \hspace{0.1cm} t_i \geq c \end{cases}\] This can then be included as a parameter in our regression model \[Y_i = \beta_0 + \varphi\mathcal{I}_{t_i} + \epsilon_i\]
Here, \(\varphi\), the coefficient of the indicator function ,can be described as the intervention effect. If this parameter is significant in our model, that implies that we have a significant change in mean at timepoint \(c\).
3.1.2 Known changepoint — change in slope
We also need to consider examples where we observe a change in slope at a known timepoint.
It would be possible to fit two separate regressions. However, this seems quite simplistic, and it would be better to have a single continuous model.
We want our regression to be continuous at \(c\) such that we have \[\alpha_1 + \beta_1 c = \alpha_2 + \beta_2 c\] This can be rewritten in terms of a single model parameter, as \[\alpha_2 = \alpha_1 + c(\beta_1 - \beta_2)\] We can thus update our equations to the following, which is known as piecewise regression (or segmented regression):
\[\begin{align*} Y_i &= \alpha_1 + \beta_1 x_i + \epsilon_i\hspace{16mm} &\mbox{ for } x<c\\ Y_i &= \alpha_1 + (\beta_1 - \beta_2)c + \beta_2 x_i + \epsilon_i\hspace{2mm} &\mbox{ for } x\geq c \end{align*}\]
(Note that this could be expressed as a single model using our indicator function.)
The two linear parts of our model now meet at \(c\). Note that our piecewise model is more efficient than two separate regressions, since it uses one fewer parameter.
In many cases, we may have more complex changes to our trend. There are a variety of more advanced models for known changepoints, but these are all based on the same underlying principles. For example, the bent cable model allows for an extended “transition phase” between the two slopes, often represented by a smooth curve.
3.2 Unknown changepoint
It can be more challenging to fit a changepoint model when you don’t clearly know exactly when the change occurred. One of the most popular methods is an iterative approach which searches across the entire range of our data for possible changepoints.